In this report, I will be exploring the Prosper lender’s data to find insights on the factors the investors are more willing to fund the loans in this data set.
## [1] 113937 81
This data set contains 81 columns and 113,937 rows.
## 'data.frame': 113937 obs. of 83 variables:
## $ ListingKey : Factor w/ 113066 levels "00003546482094282EF90E5",..: 7180 7193 6647 6669 6686 6689 6699 6706 6687 6687 ...
## $ ListingNumber : int 193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
## $ ListingCreationDate : Factor w/ 113064 levels "2005-11-09 20:44:28.847000000",..: 14184 111894 6429 64760 85967 100310 72556 74019 97834 97834 ...
## $ CreditGrade : Factor w/ 8 levels "A","AA","B","C",..: 4 NA 7 NA NA NA NA NA NA NA ...
## $ Term : int 36 36 36 36 36 60 36 36 36 36 ...
## $ LoanStatus : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
## $ ClosedDate : Factor w/ 2802 levels "2005-11-25 00:00:00",..: 1137 NA 1262 NA NA NA NA NA NA NA ...
## $ BorrowerAPR : num 0.165 0.12 0.283 0.125 0.246 ...
## $ BorrowerRate : num 0.158 0.092 0.275 0.0974 0.2085 ...
## $ LenderYield : num 0.138 0.082 0.24 0.0874 0.1985 ...
## $ EstimatedEffectiveYield : num NA 0.0796 NA 0.0849 0.1832 ...
## $ EstimatedLoss : num NA 0.0249 NA 0.0249 0.0925 ...
## $ EstimatedReturn : num NA 0.0547 NA 0.06 0.0907 ...
## $ ProsperRating..numeric. : int NA 6 NA 6 3 5 2 4 7 7 ...
## $ ProsperRating..Alpha. : Factor w/ 7 levels "A","AA","B","C",..: NA 1 NA 1 5 3 6 4 2 2 ...
## $ ProsperScore : num NA 7 NA 9 4 10 2 4 9 11 ...
## $ ListingCategory..numeric. : int 0 2 0 16 2 1 1 2 7 7 ...
## $ BorrowerState : Factor w/ 51 levels "AK","AL","AR",..: 6 6 11 11 24 33 17 5 15 15 ...
## $ Occupation : Factor w/ 67 levels "Accountant/CPA",..: 36 42 36 51 20 42 49 28 23 23 ...
## $ EmploymentStatus : Factor w/ 8 levels "Employed","Full-time",..: 8 1 3 1 1 1 1 1 1 1 ...
## $ EmploymentStatusDuration : int 2 44 NA 113 44 82 172 103 269 269 ...
## $ IsBorrowerHomeowner : Factor w/ 2 levels "False","True": 2 1 1 2 2 2 1 1 2 2 ...
## $ CurrentlyInGroup : Factor w/ 2 levels "False","True": 2 1 2 1 1 1 1 1 1 1 ...
## $ GroupKey : Factor w/ 706 levels "00343376901312423168731",..: NA NA 334 NA NA NA NA NA NA NA ...
## $ DateCreditPulled : Factor w/ 112992 levels "2005-11-09 00:30:04.487000000",..: 14347 111883 6446 64724 85857 100382 72500 73937 97888 97888 ...
## $ CreditScoreRangeLower : int 640 680 480 800 680 740 680 700 820 820 ...
## $ CreditScoreRangeUpper : int 659 699 499 819 699 759 699 719 839 839 ...
## $ FirstRecordedCreditLine : Factor w/ 11585 levels "1947-08-24 00:00:00",..: 8638 6616 8926 2246 9497 496 8264 7684 5542 5542 ...
## $ CurrentCreditLines : int 5 14 NA 5 19 21 10 6 17 17 ...
## $ OpenCreditLines : int 4 14 NA 5 19 17 7 6 16 16 ...
## $ TotalCreditLinespast7years : int 12 29 3 29 49 49 20 10 32 32 ...
## $ OpenRevolvingAccounts : int 1 13 0 7 6 13 6 5 12 12 ...
## $ OpenRevolvingMonthlyPayment : num 24 389 0 115 220 1410 214 101 219 219 ...
## $ InquiriesLast6Months : int 3 3 0 0 1 0 0 3 1 1 ...
## $ TotalInquiries : num 3 5 1 1 9 2 0 16 6 6 ...
## $ CurrentDelinquencies : int 2 0 1 4 0 0 0 0 0 0 ...
## $ AmountDelinquent : num 472 0 NA 10056 0 ...
## $ DelinquenciesLast7Years : int 4 0 0 14 0 0 0 0 0 0 ...
## $ PublicRecordsLast10Years : int 0 1 0 0 0 0 0 1 0 0 ...
## $ PublicRecordsLast12Months : int 0 0 NA 0 0 0 0 0 0 0 ...
## $ RevolvingCreditBalance : num 0 3989 NA 1444 6193 ...
## $ BankcardUtilization : num 0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
## $ AvailableBankcardCredit : num 1500 10266 NA 30754 695 ...
## $ TotalTrades : num 11 29 NA 26 39 47 16 10 29 29 ...
## $ TradesNeverDelinquent..percentage. : num 0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
## $ TradesOpenedLast6Months : num 0 2 NA 0 2 0 0 0 1 1 ...
## $ DebtToIncomeRatio : num 0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
## $ IncomeRange : Factor w/ 8 levels "$0","$1-24,999",..: 4 5 7 4 3 3 4 4 4 4 ...
## $ IncomeVerifiable : Factor w/ 2 levels "False","True": 2 2 2 2 2 2 2 2 2 2 ...
## $ StatedMonthlyIncome : num 3083 6125 2083 2875 9583 ...
## $ LoanKey : Factor w/ 113066 levels "00003683605746079487FF7",..: 100337 69837 46303 70776 71387 86505 91250 5425 908 908 ...
## $ TotalProsperLoans : int NA NA NA NA 1 NA NA NA NA NA ...
## $ TotalProsperPaymentsBilled : int NA NA NA NA 11 NA NA NA NA NA ...
## $ OnTimeProsperPayments : int NA NA NA NA 11 NA NA NA NA NA ...
## $ ProsperPaymentsLessThanOneMonthLate: int NA NA NA NA 0 NA NA NA NA NA ...
## $ ProsperPaymentsOneMonthPlusLate : int NA NA NA NA 0 NA NA NA NA NA ...
## $ ProsperPrincipalBorrowed : num NA NA NA NA 11000 NA NA NA NA NA ...
## $ ProsperPrincipalOutstanding : num NA NA NA NA 9948 ...
## $ ScorexChangeAtTimeOfListing : int NA NA NA NA NA NA NA NA NA NA ...
## $ LoanCurrentDaysDelinquent : int 0 0 0 0 0 0 0 0 0 0 ...
## $ LoanFirstDefaultedCycleNumber : int NA NA NA NA NA NA NA NA NA NA ...
## $ LoanMonthsSinceOrigination : int 78 0 86 16 6 3 11 10 3 3 ...
## $ LoanNumber : int 19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
## $ LoanOriginalAmount : int 9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
## $ LoanOriginationDate : Factor w/ 1873 levels "2005-11-15 00:00:00",..: 426 1866 260 1535 1757 1821 1649 1666 1813 1813 ...
## $ LoanOriginationQuarter : Factor w/ 33 levels "Q1 2006","Q1 2007",..: 18 8 2 32 24 33 16 16 33 33 ...
## $ MemberKey : Factor w/ 90831 levels "00003397697413387CAF966",..: 11071 10302 33781 54939 19465 48037 60448 40951 26129 26129 ...
## $ MonthlyLoanPayment : num 330 319 123 321 564 ...
## $ LP_CustomerPayments : num 11396 0 4187 5143 2820 ...
## $ LP_CustomerPrincipalPayments : num 9425 0 3001 4091 1563 ...
## $ LP_InterestandFees : num 1971 0 1186 1052 1257 ...
## $ LP_ServiceFees : num -133.2 0 -24.2 -108 -60.3 ...
## $ LP_CollectionFees : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_GrossPrincipalLoss : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_NetPrincipalLoss : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_NonPrincipalRecoverypayments : num 0 0 0 0 0 0 0 0 0 0 ...
## $ PercentFunded : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Recommendations : int 0 0 0 0 0 0 0 0 0 0 ...
## $ InvestmentFromFriendsCount : int 0 0 0 0 0 0 0 0 0 0 ...
## $ InvestmentFromFriendsAmount : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Investors : int 258 1 41 158 20 1 1 1 1 1 ...
## $ Loan_Yr : Factor w/ 10 levels "2005","2006",..: 3 10 3 8 9 9 9 9 9 9 ...
## $ ListingCategory : Factor w/ 21 levels "N/A","Debt Consolidation",..: 1 3 1 17 3 2 2 3 8 8 ...
This data set originally contain 81 variables from various types ranging from dates to factors. I created 2 more variables: Loan Year and Listing Category.
Number of Investors exhibits a positive skew mean 80.48 > median 44.00. There are more loans with number of investors less than 50.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 4000 6500 8337 12000 35000
In the first graph, loan amount exhibits a positive skew, mean 8337 > median 6500 but it doesn’t show count values between 20,000 and 30,000 loan amount. For the second graph, I took log10 of the y axis to see the loan original amount greater than 20,000. There are loans that are greater than 30,000. The maximum amount is 35,000 in this data set. Notice that there is gap surrounding 30,000. Perhaps for the loan amounts that are closer to 30,000 (like 29,000 and 31,0000), the number of those loans are very small.
Monthly Loan Payment exhibits a positive skew, mean 273 > median 218.
Most of the loans are 3 year (36 months) loans.
This shows the amount of loans offered for years of 2005-2014. There was an increase from 2006-2008 but dropped in 2009. According to Wikipedia, prosper closed the website for six months from October 2008 to July 2009. At that time, no one could not take out any loans which contributes to the dramatic decrease in 2009. After 2009, the number increased until 2013.
Monthly Income of the borrowers exhibit a positive skew mean 5600 > median 4666. The monthly income that are less than 5,000 has the highest frequency.
## $0 $1-24,999 $100,000+ $25,000-49,999 $50,000-74,999
## 621 7274 17337 32192 31050
## $75,000-99,999 Not displayed Not employed
## 16916 7741 806
Vast majority of the borrower’s income is between $25,000-74,999.
Lender Yield (interest rate minus servicing fee) exhibits a normal distribution, mean 0.18 ~ median 0.17.
The Estimated Return exhibits a normal distribution, mean 0.1 ~ median 0.1. Most loans have an estimated return between 0.05 and 0.1.
A lot of the loans have estimated loss between 0 and 0.1. There is a gap before 0.1.
Credit ranges are normally distributed with mean = 685 and median 680. I will only use Credit Score Range Lower throughout this analysis. Credit Score Range Higher has a very similar distribution but 20 points higher.
Debt to Income Ratio peaks near 2.5, and decreases after 2.5. Mean is 0.28. I wonder if debt to income ratio varies by loan categories(check in bi variate section).
The Prosper Rating of C have the highest total amount of loans. From the Prosper Website, each Prosper Rating represents an average annualized loss rate range. One possibility is that C loans could be in higher demand compared to other prosper loans. It’s possible that investors invest in loans in more than one rating to build their investment portfolio to gain more returns and balance out the loans.
## # A tibble: 2 x 2
## `InvestmentFromFriendsCount > 0` n
## <lgl> <int>
## 1 FALSE 111806
## 2 TRUE 2131
The vast majority of loans are not funded by friends of the borrowers. Mean is 0.02. 2,131 loans have some investments from friends of the borrowers. Out of those loans, majority of them have 1-2 friends who made the investment which is shown in the second graph (the y axis is transformed with log10).
Employment duration exhibits a positive skew mean 96 > median 67. It looks like more people that are employed in shorter duration.
Debt consolidation loans is the leading loan category in number of loans . I wonder what loans the borrowers are consolidating.
## # A tibble: 21 x 4
## ListingCategory sum_investors loan_amount_total n
## <fctr> <int> <int> <int>
## 1 N/A 1721298 106096621 16965
## 2 Debt Consolidation 4289693 577736197 58308
## 3 Home Improvement 648742 60148466 7433
## 4 Business 864554 64175191 7189
## 5 Personal Loan 245991 10913226 2395
## 6 Student Use 68751 2657014 756
## 7 Auto 175390 12861665 2572
## 8 Other 752000 62035775 10494
## 9 Baby&Adoption 13739 1940528 199
## 10 Boat 6070 742400 85
## # ... with 11 more rows
I created a table of the loan listing categories with the total number of investors and total amount of loans. Debt Consolidation have the highest amount of investors in addition to the number of loans. It seems there are a lot of loans not classified.
There are 11,937 loans and 81 variables in this data set. The variables I will be focusing on will be investors, listing category, lender’s yield, loan year, term, debt to income ratio, credit score, prosper rating(alpha and numeric) and estimated return.
Here’s what I observed so far:
There are a lot of features of interest in my data set. The main feature of interest is investors.
listing category, credit scores, lender’s yield, estimated return, and loan status will help.
Yes. I created loan year (the year the loan was listed) and listing category (category of the loan).
No.
Let’s check the average amount of investors per term loan. 3 year loans have the highest average of investors. Since most loans are 3 years, it’s not surprising that 3 year loans have the highest average amount of investors. Notice that 5 year loans have the widest range in comparison to 1 and 3 year loans in the box plot graph.
5 year term have the highest median lender yield. However, 3 year term have the widest range of lender’s yield.
It seems that the debt to income ratio is the similar for all loan categories.
Credit Score of 700+ has the highest amount of Investors. Perhaps more investors are willing to invest borrowers whose credit score is 700 or higher from the regression line.
From the regression line, as credit score increases, the lender yield decreases.
Most investors have an estimated return between 0 and 0.1 from the loans.
Credit Score Range is similar across all loan listing categories. A few categories have credit score above 800 like Debt Consolidation, Auto, and Household expenses.
Borrower rate correlates to lender yield. As borrower rate increases, the lender’s yield increases.
The number of investors increases as prosper score increases. More investors are wiling to fund the loans that are that have prosper score above 7.
## ProsperRating..Alpha.: A
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 1.00 51.00 97.53 162.00 1189.00
## --------------------------------------------------------
## ProsperRating..Alpha.: AA
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 62.75 150.00 178.37 274.00 1035.00
## --------------------------------------------------------
## ProsperRating..Alpha.: B
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 1.0 19.0 69.8 112.0 856.0
## --------------------------------------------------------
## ProsperRating..Alpha.: C
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 1.00 9.00 51.09 78.00 1024.00
## --------------------------------------------------------
## ProsperRating..Alpha.: D
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 4.00 35.00 56.23 87.00 511.00
## --------------------------------------------------------
## ProsperRating..Alpha.: E
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 2.00 25.00 35.01 55.00 279.00
## --------------------------------------------------------
## ProsperRating..Alpha.: HR
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 9.00 31.00 35.28 54.00 237.00
The highest amount of investors are investing in lower risk loans where prosper rating is A than higher risk loans like HR. The Prosper Rating AA has the highest amount of investors. Let’s check if the lender yield correlates to the risk of loans.
HR (highest risk) have the highest lender’s yield while AA (lower risk) have the lowest average lender’s yield. Let’s check on the rating in relation to debt to income ratio.
Similar to the previous plot, HR has the highest debt to income ratio while AA has the lowest. Does that mean borrowers in HR category are more defaulted loans?
Even though HR has the highest debt-to-income ratio, it does not have the highest number of loans defaulted. C loans have the highest number of defaulted loans which is unexpected. Is there a correlation between debt-to-income ratio and number of days the loan is delinquent?
##
## Pearson's product-moment correlation
##
## data: loan$DebtToIncomeRatio and loan$LoanCurrentDaysDelinquent
## t = 17.856, df = 105380, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.04890099 0.06093975
## sample estimates:
## cor
## 0.05492236
The correlation is 0.05 which tells me that there is very little positive correlation. The scatter graph shows the lower debt to income ratio has the highest amount of days the loan is delinquent. There is no relationship between the two. Is there a correlation between the prosper rating (risk) and number of days the loan is delinquent?
##
## Pearson's product-moment correlation
##
## data: loan$ProsperScore and loan$LoanCurrentDaysDelinquent
## t = -11.404, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.04583710 -0.03240077
## sample estimates:
## cor
## -0.0391207
The correlation is -0.04 which tells me there is very little negative correlation between risk and days that the loan is delinquent. The graph shows very little relationship between the two.
The graph above shows the number of investors are investing loans per year. There was an increase number of investors in 2006-2008 but the number decreased drastically in 2009. On November 2008, a class lawsuit was filed against Prosper in California. The investors alleged that Prosper offered and sold unqualified and unregistered securities in the violation of the California and federal securities laws. The lawsuit was settled on July 2013 (source:https://en.wikipedia.org/wiki/Prosper_Marketplace). Prosper closed the website for six months from October 2008 to July 2009. The number of investors dropped in 2009 due to the temporarily closed prosper website and the lawsuit.
Investors strongly correlate to the risk of the loans (prosper rating alpha).
Investors also correlate with credit scores. It seems like they are more willing to invest of credit scores of 700+.
The highest number of investors are enrolled in 2013. There was a drop of investors in the year of 2009.
The risk of the loan does not indicate whether the loans will be in default. HR has the highest debt-to-income ratio but C has the highest number of loans in default.
Another interesting relationship I found was that credit scores does not correlates to lender’s yield. Investors gain the same amount of returns across variations of risks.
The strongest relationship I found is that the higher the risk of the loan, the higher debt to income ratio will become.
The graph above show the number of investors per year by the term of the loan. From the bi variate section, we saw that the number of investors dropped drastically in 2009 due to the temporary closing and the lawsuit. In this graph, the 36 term loans have the highest amount of investors in 2009. However, the 12 month and 60 month term loans don’t have any investors until 2010. Maybe 12 and 60 month term loans were not offered prior to 2010 could be a possibility.
From this graph, the higher number of investors invest in loans with a high prosper score and credit score higher than 700. Notice that there is a dot, prosper score of 7 athat lies near 1200 grid line. There are about 1200 investors who invest in a loan with a prosper score of 7 for a credit score between 750 and 800.
From this graph, the estimated return are highest with the debt consolidation category (listing category 1). Perhaps this is why, debt consolidation loans have the highest number of investors.
This graph shows the estimated return buckets for each term loan for the amount of investors. 3 year term has all buckets of estimated return while 1 year and 5 year term have only (0, 0.1) and (0.1-0.2) buckets. There is a small box for the 1 year term but there is no indication of which bucket it is (0.2-0.3 or NA).
The number of investors are highest with the credit score higher than 700 and a high prosper score(like higher than 7). They are more willing to invest in loans that are low risk and the corresponding borrower’s credit score is very good.
That the 36 month term loans have the highest number of investors in 2009 even though in the loan vs. investor graph in the bi variate section, 2009 has the lowest number of investors.
I didn’t create any models in this data set.
This is a box plot of the number of investors per Prosper Rating by letter. Prosper rating is a loan risk score from low risk (A) to high risk (HR). This box plot shows that the loan prosper rating of AA has the highest 3rd quantile and highest average number of investors. Investors are more willing to invest in low risk loans than high risk loans.
This is a scatter plot of about number of investors by credit scores fill in by prosper score (numeric). Keep in mind that the higher the prosper score, the lower the risk of the loan. This map shows the highest amount of investors are in the area where prosper score is higher 7.0 and the credit score is higher than 700+. Notice that there is a dot, prosper score of 7 athat lies near 1200 grid line. There are about 1200 investors who invest in a loan with a prosper score of 7 for a credit score between 750 and 800. From this graph, investors are more likely to invest in loans where the risk is low and the borrower’s credit score is high.
This graph shows the number of investors who invest in 12, 36, and 60 month term loans by year. 36 month term loans have the highest number of investors in 2009 even though Prosper had a lawsuit and their website was temporarily shutdown in the same year. The number of investors for 12 and 60 month term loans appear from 2010 onward. Notice that 60 month (5 year) term has the highest amount of investors in 2010. There is no boxplot for 2014 for all terms. Only outliers for 36 and 60 month terms. One possibility is that the year of the listings stops mid 2014 not the end of the year.
When I started exploring the data set, I was confused on what to do analysis on. There were 81 variables and it was hard for me to choose what variables would be useful and which variable could be my main focus. I didn’t know what to look for in this data set. I started to explore without any expectations. But I was more confused. I decided to focus on the investors of this data set out of the blue. The analysis was so much smoother.
I explore what factors investors are willing to invest in loans. So far, I discovered is credit score of borrower, risk of the loan, and loan category from the loan listings in the data set. The problem is every observation is a loan, not an investor. There is a possibility that investors invest in more than one loan. I would like to further my analysis on a data set than contain investors as an observation and the loans they are investing. I also want to find out how much returns they earn each year from their portfolio. Which portfolio yields the highest returns? Which portfolio yields the lowest returns?